1. A system for performing a fast Fourier transform on N ordered inputs in n stages
comprising: 

a non-final stage calculating means for repetitively performing in-place butterfly 
calculations for n-1 stages;

a final stage calculating means for performing a final stage of butterfly
calculations including:

a first loop means for performing a portion of the final stage butterfly 
calculations, the first loop means iterating on a table of first loop index values consisting 
of values that bit-reverse into themselves, the first loop means including control logic to 
select inputs for a set of butterfly calculations based on the first loop index values,
performing the set of butterfly calculations, and storing butterfly calculation outputs in
shuffled order in place of the selected inputs to result in a correct ordering of transform 
outputs; and 

a second loop means for performing a remaining portion of the final stage 
butterfly calculations, the second loop means iterating on a table of second loop index
value pairs consisting of two values that bit-reverse into each other, the second loop 
means including control logic to select inputs for two sets of butterfly calculations based 
on the two second loop index pair values respectively, performing two sets of butterfly 
calculations, and storing butterfly calculation outputs from a first one of the two sets of 
butterfly calculations in shuffled order in place of the inputs selected for a second one of
the two sets of butterfly calculations and storing butterfly calculation outputs from the 
second one of the two sets of butterfly calculations in shuffled order in place of the inputs 
selected for the first one of the two sets of butterfly calculations to result in a correct 
ordering of transform outputs. 
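
By way of illustration only, the two index tables recited in claim 1 might be built as in the following Python sketch. The function names `bit_reverse` and `build_index_tables` are hypothetical, and the choice of a 4-bit index width (corresponding to N/16 = 16, that is N = 256) is an assumption made for the example; the claims do not prescribe how the tables are generated.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def build_index_tables(bits):
    """Split 0 .. 2**bits - 1 into self-reversing values and reversal pairs."""
    self_reversing = []  # values that bit-reverse into themselves
    pairs = []           # (a, b) with a < b, where a and b bit-reverse into each other
    for i in range(1 << bits):
        r = bit_reverse(i, bits)
        if r == i:
            self_reversing.append(i)
        elif i < r:
            pairs.append((i, r))
    return self_reversing, pairs


singles, pairs = build_index_tables(4)
print(singles)  # [0, 6, 9, 15]
print(pairs)    # [(1, 8), (2, 4), (3, 12), (5, 10), (7, 14), (11, 13)]
```

Every index appears exactly once, either in the first table or in one pair of the second table, so the two loops together visit each index a single time.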

2. The system of claim 1, wherein the final stage calculating means performs all 
butterfly calculations as radix-4 butterflies having four inputs and four outputs. 
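
A radix-4 butterfly having four inputs and four outputs can be sketched as an unscaled four-point discrete Fourier transform. The sketch below is illustrative only: the function name `radix4_butterfly` is hypothetical, the forward sign convention (multiplication by -j) is an assumption, and twiddle factor multiplication is omitted because the claims do not state where it is applied.

```python
def radix4_butterfly(a, b, c, d):
    """Four-input, four-output butterfly (unscaled 4-point DFT, forward sign assumed)."""
    s0 = a + c
    s1 = a - c
    s2 = b + d
    s3 = (b - d) * -1j  # rotation by -90 degrees
    return (s0 + s2, s1 + s3, s0 - s2, s1 - s3)


print(radix4_butterfly(1, 1, 1, 1))  # a constant input puts all energy in output 0
```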

3. The system of claim 2, wherein N is a power of two. 

4. The system of claim 3, wherein the non-final stage calculating means performs a
first stage of radix-8 butterfly calculations followed by n-2 stages of radix-4 butterfly 
calculations. 

5. The system of claim 3, wherein the first loop means iterates through a list of first
loop index values between 0 and N/16-1 that bit reverse into themselves. 

6. The system of claim 5, wherein the first loop means includes control logic for 
selecting four groups of four consecutive inputs for each first loop iteration, the inputs
being selected by transforming the first loop index value into four input indices by
multiplying the first loop index value by four and successively adding N/4 to result in 
four input indices, each group of four consecutive inputs being selected beginning with 
one input index, the four groups of four consecutive inputs being representable as a 4 X 4 
matrix. 
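
The input selection of claim 6 can be illustrated with a short Python sketch. The function name `select_groups` and the example size N = 64 are assumptions made for the illustration; the sketch only computes positions and does not touch any data.

```python
def select_groups(index, n):
    """Return the 4 x 4 matrix of input positions for one loop index value."""
    quarter = n // 4
    # Multiply the index by four, then successively add N/4 to get four input indices.
    starts = [4 * index + k * quarter for k in range(4)]
    # Each input index begins a group of four consecutive inputs (one matrix row).
    return [[start + j for j in range(4)] for start in starts]


for row in select_groups(1, 64):
    print(row)
# [4, 5, 6, 7]
# [20, 21, 22, 23]
# [36, 37, 38, 39]
# [52, 53, 54, 55]
```

Taken over all index values from 0 to N/16-1, these matrices cover every one of the N positions exactly once.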

7. The system of claim 6, wherein the first loop means further includes control logic 
for performing four radix-4 butterfly calculations in each first loop iteration, one butterfly 
calculation being performed on each group of four consecutive inputs, the four radix-4 
butterfly calculations generating four groups of four outputs, the outputs being
representable as a 4 X 4 matrix.

8. The system of claim 7, wherein the first loop means further includes control logic 
to store the outputs in place of the inputs in shuffled order, the shuffled order resulting 
from a 4 X 4 matrix transposition and subsequent swapping of two inner columns. 
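
The shuffled order of claim 8 can be shown on a small example. In the following Python sketch the function name `shuffle_4x4` is hypothetical, and the "two inner columns" are taken to be the second and third columns of the transposed matrix, which is an interpretation of the claim language.

```python
def shuffle_4x4(matrix):
    """Transpose a 4 x 4 matrix, then swap its two inner columns."""
    transposed = [list(column) for column in zip(*matrix)]
    return [[row[0], row[2], row[1], row[3]] for row in transposed]


outputs = [[0, 1, 2, 3],
           [4, 5, 6, 7],
           [8, 9, 10, 11],
           [12, 13, 14, 15]]
for row in shuffle_4x4(outputs):
    print(row)
# [0, 8, 4, 12]
# [1, 9, 5, 13]
# [2, 10, 6, 14]
# [3, 11, 7, 15]
```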

9. The system of claim 8, wherein control logic of the first loop means shuffles the 
order of the four groups of four inputs to the radix-4 butterfly calculations so as to 
generate outputs in shuffled order.

10. The system of claim 8, wherein control logic of the first loop means shuffles the
order of the four groups of four outputs from the radix-4 butterfly calculations before 
storing. 

11. The system of claim 3, wherein the second loop means iterates through a list of
second loop index value pairs between 0 and N/16-1 that bit reverse into each other. 

12. The system of claim 11, wherein the second loop means includes control logic for
selecting two sets of four groups of four consecutive inputs for each second loop
iteration, the first set of inputs being selected by transforming a first value of a second
loop index value pair into four input indices by multiplying the first value of the second
loop index value pair by four and successively adding N/4 to result in four input indices,
each group of four consecutive inputs in the first set of inputs being selected beginning
with one input index, the four groups of four consecutive inputs in the first set being
representable as a 4 X 4 matrix, the second set of inputs being selected by transforming a
second value of the second loop index value pair into four input indices by multiplying
the second value of the second loop index value pair by four and successively adding N/4
to result in four input indices, each group of four consecutive inputs in the second set of
inputs being selected beginning with one input index, the four groups of four consecutive
inputs in the second set being representable as a 4 X 4 matrix.

13. The system of claim 12, wherein the second loop means further includes control 
logic for performing two sets of four radix-4 butterfly calculations in each second loop 
iteration, one butterfly calculation being performed on each group of four consecutive
inputs, the four radix-4 butterfly calculations generating two sets of four groups of four
outputs, the outputs being representable as two 4X4 matrices. 

14. The system of claim 13, wherein the second loop means further includes control 
logic to store the outputs of each set of four radix-4 butterfly calculations in place of the
inputs to the other set of four radix-4 butterfly calculations in shuffled order, the shuffled
order resulting from a 4 X 4 matrix transposition and subsequent swapping of two inner
columns. 

15. The system of claim 14, further comprising a twiddle factor storage element
storing twiddle factors for application in the butterfly calculations, the twiddle factor
storage element storing twiddle factors in groups of four, each group having an index and
the groups being stored in bit reversed order based on the index. 
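
One possible layout for the twiddle factor storage of claim 15 is sketched below in Python. Only the storage order (group g stored at the bit-reversed position of g) follows the claim. The contents of each group, here the four powers W^(g*k) for k = 0..3 with W = exp(-2*pi*j/N), are purely an assumption made so the example has something to store, and the names `twiddle_table` and `bit_reverse` are hypothetical.

```python
import cmath


def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def twiddle_table(bits, n):
    """Store 2**bits groups of four twiddle factors in bit-reversed group order."""
    count = 1 << bits
    table = [None] * count
    for g in range(count):
        # Assumed group contents: four consecutive powers of W**g.
        group = [cmath.exp(-2j * cmath.pi * g * k / n) for k in range(4)]
        table[bit_reverse(g, bits)] = group
    return table


table = twiddle_table(2, 16)
print(table[2][1])  # group with index 1 is stored at position 2 (01 reversed is 10)
```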

16. The system of claim 2, wherein the non-final and final stage calculating means
include a four-fold SIMD processor for performing four radix-4 butterfly calculations at a
time. 
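
The idea of performing four radix-4 butterfly calculations at a time can be emulated in Python by treating a four-element list as one four-lane register. This is a sketch only: the helper names are hypothetical, plain lists stand in for SIMD registers, the forward sign convention is assumed, and twiddle factors are omitted.

```python
def vadd(x, y):
    """Lane-wise addition of two four-lane vectors."""
    return [p + q for p, q in zip(x, y)]


def vsub(x, y):
    """Lane-wise subtraction of two four-lane vectors."""
    return [p - q for p, q in zip(x, y)]


def vmul_neg_j(x):
    """Lane-wise multiplication by -j."""
    return [p * -1j for p in x]


def radix4_butterfly_x4(a, b, c, d):
    """Four butterflies at once: lane i of a, b, c, d feeds butterfly i."""
    s0, s1 = vadd(a, c), vsub(a, c)
    s2, s3 = vadd(b, d), vmul_neg_j(vsub(b, d))
    return vadd(s0, s2), vadd(s1, s3), vsub(s0, s2), vsub(s1, s3)


y0, y1, y2, y3 = radix4_butterfly_x4([1, 0, 1, 2], [0, 1, 1, 0],
                                     [0, 0, 1, 0], [0, 0, 1, 0])
print(y0)  # first output of each of the four butterflies
```

Each vector holds the same-numbered input of four different butterflies, so every arithmetic step advances all four butterflies together.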

17. In a system for calculating a fast Fourier transform on N ordered input elements 
having a plurality of calculation stages wherein N is a power of two, a method for
performing a final calculation stage comprising:

performing a first iteration loop, the first iteration loop accepting input elements 
located using index values that bit-reverse into themselves, performing butterfly 
calculations on the inputs, and storing output elements in place of the input elements in a 
shuffled order, the shuffled order resulting in the output elements being ordered in the
same manner as the input elements to the fast Fourier transform, the iteration loop
iterating through all bit-reversal index values, 

performing a second iteration loop, the second iteration loop accepting input 
elements located using pairs of index values that bit-reverse into each other, performing a 
first set of butterfly calculations using input elements located using a first index value of
a bit-reversal pair, performing a second set of butterfly calculations using input elements
located using a second index value of a bit-reversal pair, storing output elements from the 
first set of butterfly calculations in place of the input elements to the second set of 
butterfly calculations in shuffled order and storing output elements from the second set of 
butterfly calculations in place of the input elements to the first set of butterfly
calculations in shuffled order, the order of the storing resulting in the output elements
being ordered in the same manner as the input elements to the fast Fourier transform, the
iteration loop iterating through all bit-reversal pair index values. 

18. The method of claim 17, wherein the final stage butterfly calculations are 
performed as radix-4 butterflies having four inputs and four outputs.

19. The method of claim 18, wherein performing the first iteration loop includes 
iterating through a list of first loop index values between 0 and N/16-1 that bit reverse 
into themselves. 

20. The method of claim 19, wherein performing the first iteration loop includes 
selecting four groups of four consecutive inputs for each first loop iteration, the inputs 
being selected by transforming the first loop index value into four input indices by 
multiplying the first loop index value by four and successively adding N/4 to result in
four input indices, each group of four consecutive inputs being selected beginning with
one input index, the four groups of four consecutive inputs being representable as a 4 X 4 
matrix. 

21. The method of claim 20, wherein performing the first iteration loop includes
performing four radix-4 butterfly calculations in each first loop iteration, one butterfly
calculation being performed on each group of four consecutive inputs, the four radix-4
butterfly calculations generating four groups of four outputs, the outputs being 
representable as a 4 X 4 matrix. 

22. The method of claim 21, wherein performing the first iteration loop includes
storing the outputs in place of the inputs in shuffled order, the shuffled order resulting 
from a 4 X 4 matrix transposition and subsequent swapping of two inner columns. 

23. The method of claim 22, wherein performing the first iteration loop includes
shuffling the order of the four groups of four inputs to the radix-4 butterfly calculations
so as to generate outputs in shuffled order.

24. The method of claim 23, wherein performing the first iteration loop includes
shuffling the order of the four groups of four outputs from the radix-4 butterfly 
calculations before storing. 

25. The method of claim 18, wherein performing the second iteration loop includes 
iterating through a list of second loop index value pairs between 0 and N/16-1 that bit 
reverse into each other. 

26. The method of claim 25, wherein performing the second iteration loop includes
selecting two sets of four groups of four consecutive inputs for each second loop
iteration, the first set of inputs being selected by transforming a first value of a second
loop index value pair into four input indices by multiplying the first value of the second
loop index value pair by four and successively adding N/4 to result in four input indices,
each group of four consecutive inputs in the first set of inputs being selected beginning
with one input index, the four groups of four consecutive inputs in the first set being
representable as a 4 X 4 matrix, the second set of inputs being selected by transforming a
second value of the second loop index value pair into four input indices by multiplying
the second value of the second loop index value pair by four and successively adding N/4
to result in four input indices, each group of four consecutive inputs in the second set of
inputs being selected beginning with one input index, the four groups of four consecutive 
inputs in the second set being representable as a 4 X 4 matrix. 

27. The method of claim 26, wherein performing the second iteration loop includes
performing two sets of four radix-4 butterfly calculations in each second loop iteration,
one butterfly calculation being performed on each group of four consecutive inputs, the 
four radix-4 butterfly calculations generating two sets of four groups of four outputs, the 
outputs being representable as two 4X4 matrices. 

28. The method of claim 27, wherein performing the second iteration loop includes storing the
outputs of each set of four radix-4 butterfly calculations in place of the inputs to the other
set of four radix-4 butterfly calculations in shuffled order, the shuffled order resulting
from a 4 X 4 matrix transposition and subsequent swapping of two inner columns.

EV 324 850 535 US

29. The method of claim 28, further comprising storing twiddle factors for application
in the butterfly calculations, the twiddle factor storage element storing twiddle factors in
groups of four, each group having an index and the groups being stored in bit reversed
order based on the index. 

30. The method of claim 18, wherein performing radix-4 butterfly calculations
includes performing four radix-4 butterfly calculations at a time using a four-fold SIMD
processor. 

31. A computer program product for performing a final calculation stage in a system 
for calculating a fast Fourier transform on N ordered input elements having a plurality of
calculation stages wherein N is a power of two, comprising:

a computer readable medium containing computer readable code for performing a 
first iteration loop, the first iteration loop accepting input elements located using index 
values that bit-reverse into themselves, performing butterfly calculations on the inputs, 
and storing output elements in place of the input elements in a shuffled order, the shuffled
order resulting in the output elements being ordered in the same manner as the input
elements to the fast Fourier transform, the iteration loop iterating through all bit-reversal
index values, 

a computer readable medium containing computer readable code for performing a
second iteration loop, the second iteration loop accepting input elements located using
pairs of index values that bit-reverse into each other, performing a first set of butterfly
calculations using input elements located using a first index value of a bit-reversal pair, 
performing a second set of butterfly calculations using input elements located using a 
second index value of a bit-reversal pair, storing output elements from the first set of 
butterfly calculations in place of the input elements to the second set of butterfly
calculations in shuffled order and storing output elements from the second set of butterfly
calculations in place of the input elements to the first set of butterfly calculations in
shuffled order, the order of the storing resulting in the output elements being ordered in
the same manner as the input elements to the fast Fourier transform, the iteration loop 
iterating through all bit-reversal pair index values. 

32. The computer program product of claim 31, wherein the final stage butterfly
calculations are performed as radix-4 butterflies having four inputs and four outputs.

33. The computer program product of claim 32, wherein performing the first iteration 
loop includes iterating through a list of first loop index values between 0 and N/16-1 that
bit reverse into themselves.

34. The computer program product of claim 33, wherein performing the first iteration 
loop includes selecting four groups of four consecutive inputs for each first loop iteration, 
the inputs being selected by transforming the first loop index value into four input
indices by multiplying the first loop index value by four and successively adding N/4 to
result in four input indices, each group of four consecutive inputs being selected 
beginning with one input index, the four groups of four consecutive inputs being 
representable as a 4 X 4 matrix. 

35. The computer program product of claim 34, wherein performing the first iteration
loop includes performing four radix-4 butterfly calculations in each first loop iteration, 
one butterfly calculation being performed on each group of four consecutive inputs, the 
four radix-4 butterfly calculations generating four groups of four outputs, the outputs 
being representable as a 4 X 4 matrix. 

36. The computer program product of claim 35, wherein performing the first iteration 
loop includes storing the outputs in place of the inputs in shuffled order, the shuffled 
order resulting from a 4 X 4 matrix transposition and subsequent swapping of two inner 
columns. 



37. The computer program product of claim 36, wherein performing the first iteration 
loop includes shuffling the order of the four groups of four inputs to the radix-4 butterfly 
calculations so as to generate outputs in shuffled order. 

38. The computer program product of claim 37, wherein performing the first iteration
loop includes shuffling the order of the four groups of four outputs from the radix-4 
butterfly calculations before storing. 

39. The computer program product of claim 38, wherein performing the second
iteration loop includes iterating through a list of second loop index value pairs between 0
and N/16-1 that bit reverse into each other. 

40. The computer program product of claim 39, wherein performing the second 
iteration loop includes selecting two sets of four groups of four consecutive inputs for
each second loop iteration, the first set of inputs being selected by transforming a first
value of a second loop index value pair into four input indices by multiplying the first
value of the second loop index value pair by four and successively adding N/4 to result in
four input indices, each group of four consecutive inputs in the first set of inputs being
selected beginning with one input index, the four groups of four consecutive inputs in the
first set being representable as a 4 X 4 matrix, the second set of inputs being selected by
transforming a second value of the second loop index value pair into four input indices by
multiplying the second value of the second loop index value pair by four and successively
adding N/4 to result in four input indices, each group of four consecutive inputs in the
second set of inputs being selected beginning with one input index, the four groups of
four consecutive inputs in the second set being representable as a 4 X 4 matrix.

41. The computer program product of claim 40, wherein performing the second iteration loop
includes performing two sets of four radix-4 butterfly calculations in each second loop
iteration, one butterfly calculation being performed on each group of four consecutive
inputs, the four radix-4 butterfly calculations generating two sets of four groups of four
outputs, the outputs being representable as two 4X4 matrices.

42. The computer program product of claim 41, wherein performing the second iteration loop
includes storing the outputs of each set of four radix-4 butterfly calculations in place of
the inputs to the other set of four radix-4 butterfly calculations in shuffled order, the
shuffled order resulting from a 4 X 4 matrix transposition and subsequent swapping of
two inner columns. 

43. The computer program product of claim 42, further comprising storing twiddle 
factors for application in the butterfly calculations, the twiddle factor storage element
storing twiddle factors in groups of four, each group having an index and the groups
being stored in bit reversed order based on the index. 

44. The computer program product of claim 32, wherein performing radix-4 butterfly 
calculations includes performing four radix-4 butterfly calculations at a time using a
four-fold SIMD processor.

45. A computer system for performing a fast Fourier transform on N
ordered inputs in n stages comprising: 

a four-fold SIMD processor; 

a non-final stage calculating means for repetitively performing in-place butterfly
calculations for n-1 stages, the butterfly calculations being performed in groups of four
simultaneous butterflies on the four-fold SIMD processor; 

a final stage calculating means for performing a final stage of butterfly 
calculations on the four-fold SIMD processor including: 

a first loop means for performing a portion of the final stage butterfly
calculations, the first loop means iterating on a table of first loop index values consisting
of values that bit-reverse into themselves, the first loop means including control logic to 
select inputs for four simultaneous radix-4 butterfly calculations based on the first loop 
index values, performing the four simultaneous radix-4 butterfly calculations, and storing
butterfly calculation outputs in shuffled order in place of the selected inputs to result in a
correct ordering of transform outputs; and

a second loop means for performing a remaining portion of the final stage
butterfly calculations, the second loop means iterating on a table of second loop index 
value pairs consisting of two values that bit-reverse into each other, the second loop 
means including control logic to select inputs for two sets of four simultaneous radix-4
butterfly calculations based on the two second loop index pair values respectively,
performing the two sets of four simultaneous radix-4 butterfly calculations, and storing
butterfly calculation outputs from a first one of the two sets of butterfly calculations in 
shuffled order in place of the inputs selected for a second one of the two sets of butterfly 
calculations and storing butterfly calculation outputs from the second one of the two
sets of butterfly calculations in shuffled order in place of the inputs selected for the
first one of the two sets of butterfly calculations to result in a correct ordering of
transform outputs. 



46. A method for calculating a fast Fourier transform comprising: 
accessing an ordered set of N transform inputs;

performing one or more non-final stages of butterfly calculations, each of the one
or more stages accepting N inputs and transforming them into stage outputs and storing
the outputs in place of the inputs in a bit-reversed order;

performing a final stage of butterfly calculations using the outputs of a previous 
stage of butterfly calculations as inputs to the final stage of butterfly calculations, the
final stage of butterfly calculations including: 

(a) a first loop iterated through a list of first loop index values between 0 and 
N/16-1 that bit reverse into themselves, the first loop including: 

(i) selecting four groups of four consecutive inputs, the inputs being
selected by transforming the first loop index value into four input indices by
multiplying the first loop index value by four and successively adding N/4 to
result in four input indices, each group of four consecutive inputs being selected
beginning with one input index, the four groups of four consecutive inputs being
representable as a 4 X 4 matrix;

(ii) performing four radix-4 butterfly calculations, one calculation for
each group of four inputs; and

(iii) storing the outputs in place of the inputs in shuffled order, the
shuffled order resulting from a 4 X 4 matrix transposition and subsequent 
swapping of two inner columns; and 

(b) a second loop iterated through a list of second loop index pair values, the 
second loop index pair values being each pair of values between 0 and N/16-1 that bit
reverse into each other, the second loop including:

(i) selecting two sets of four groups of four consecutive inputs, the
inputs being selected by transforming each value in the second loop index pair
into four input indices by multiplying each second loop index pair value by four
and successively adding N/4 to result in two sets of four input indices, each group
of four consecutive inputs being selected beginning with one input index, the two
sets of four groups of four consecutive inputs being representable as two 4X4 
matrices; 

(ii) performing two sets of four radix-4 butterfly calculations, one 
calculation for each group of four inputs; and

(iii) storing the outputs of each set of four radix-4 butterfly calculations 
in place of the inputs to the other set of four radix-4 butterfly calculations in 
shuffled order, the shuffled order resulting from a 4 X 4 matrix transposition and 
subsequent swapping of two inner columns; 

wherein following the second loop, outputs from the fast Fourier transform are
correctly ordered. 
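
The data movement of the final stage in claim 46 can be traced with the following Python sketch. It is an illustration under stated assumptions, not a complete transform: all function names are hypothetical, the butterfly is passed in as a parameter, and the demonstration uses a pass-through butterfly so that only the selection and shuffled storing of steps (a) and (b) are visible. The non-final stages and twiddle factors are not modelled.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def positions(index, n):
    """4 x 4 matrix of positions: four groups of four consecutive inputs."""
    return [[4 * index + k * (n // 4) + j for j in range(4)] for k in range(4)]


def shuffle(matrix):
    """Transpose, then swap the two inner columns."""
    transposed = [list(column) for column in zip(*matrix)]
    return [[row[0], row[2], row[1], row[3]] for row in transposed]


def compute(data, pos, butterfly):
    """Run one butterfly per group of four inputs and shuffle the outputs."""
    return shuffle([list(butterfly(*(data[i] for i in row))) for row in pos])


def store(data, pos, values):
    """Write a 4 x 4 matrix of values to a 4 x 4 matrix of positions."""
    for pos_row, value_row in zip(pos, values):
        for i, v in zip(pos_row, value_row):
            data[i] = v


def final_stage(data, butterfly):
    """Final stage for len(data) = N, a power of two with N >= 16 (assumed)."""
    n = len(data)
    count = n // 16
    bits = count.bit_length() - 1
    singles = [i for i in range(count) if bit_reverse(i, bits) == i]
    pairs = [(i, bit_reverse(i, bits)) for i in range(count)
             if i < bit_reverse(i, bits)]
    for i in singles:  # (a) first loop: outputs replace their own inputs
        pos = positions(i, n)
        store(data, pos, compute(data, pos, butterfly))
    for p, q in pairs:  # (b) second loop: outputs replace the other set's inputs
        pos_p, pos_q = positions(p, n), positions(q, n)
        out_p = compute(data, pos_p, butterfly)
        out_q = compute(data, pos_q, butterfly)
        store(data, pos_q, out_p)
        store(data, pos_p, out_q)
    return data


data = final_stage(list(range(64)), lambda a, b, c, d: (a, b, c, d))
print(data[:12])  # [0, 32, 16, 48, 8, 40, 24, 56, 4, 36, 20, 52]
```

Both sets of a pair are computed before either is stored, which is what allows the exchange to be done in place without a scratch buffer.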